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We show how to adapt the quasi-Newton method to the electronic-structure calculations using 
systematic basis sets. Our implementation requires less iterations than the conjugate gradient 
method, while the computational cost per iteration is much lower. The memory usage is also 
quite modest, thanks to the efficient representation of the approximate Hessian. 
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§1. Introduction 

The importance of the first-principles electronic-structure calculations based on the density- 
functional theory§'i'i' is increasing year by year.Bi'i) Since the optimization of the ground-state 
wavefunctions is the most time-consuming part of these calculations, it is crucial to use an efficient 
algorithm for this purpose. However, the number of degrees of freedom is so large for systematic 
basis sets like plane-wavespBcP finite-differences, ) and finite-elements,§00@B) that the 
memory usage of the algorithm being used is severely restricted. Currently, the conjugate gradient 
seems to be most widely used because of its efficiency and modest memory 
usage, while the direct inversion in the iterative subspace (DIIS)@@0@il' is also sometimes 
used. 

On the other hand, the quasi-Newton methods have rarely been used for electronic optimization 
in combination with systematic basis sets, although their efficiency is well known;0§S© to the 
best of our knowledge, the application of the quasi-Newton methods in this context has been 
limited to atomic orbitalsHil) or one-dimensional problems.© This is presumably because they 
require significantly more storage for the elements of the (approximate) Hessian matrix. If an all- 
band update is used, the dimension of the Hessian (Ti) is given by Af = N B N G , where N B is the 
number of orbitals and N G is the number of basis functions. Therefore, the storage requirement for 
Ti (Af x TV) will be Af 2 in a naive implementation^ which is prohibitive for large-scale simulations 
where Af can exceed 10 7 . A more practical implementation of the quasi-Newton method is also 



found in the literature^ in which only the m previous steps are relevant. Since two update vectors 
of size J\f are required per step, the memory usage amounts to 2mM elements, where m is usually 
less than 10. However, this can be further reduced to mAf if the initial Hessian is a multiple of the 
unit matrix. In this article, we present the implementation of the quasi-Newton method using 
the BFGS (Broyden-Fletcher-Goldfarb-Shanno) formulall'0 along this line. As explained in the 
next section, we make a number of modifications to adapt the algorithm to the electronic-structure 
calculations. The most important one is the compression of the update vectors by an order of 
magnitude, which makes this algorithm attractive even for very large systems. 

§2. Methods 

2.1 Electronic- structure calculations 

First of all, we explain the basic problems in the electronic-structure calculations within the 
density-functional theoryflB* Only real wavefunctions at the T-point of the Brillouin zone are 
considered for notational simplicity, but generalization to complex wavefunctions is straightforward. 

The total energy functional for an ionic configuration R is given byfi* 

£ tota] [<P, R}=T,J <M r ) [" y2 + y ps[^]] 1>i(r) dr + EH xc [n(r)} + E ion [R], (2.1) 

i 

where 

# = (Vi(r) Mr) - feW) T , (2.2) 

n(r) = J2Mr)\ 2 , (2.3) 

i 

and i?Hxc is the sum of the Hartree and exchange-correlation energy, which is a nonlinear and 
nonlocal functional of the electron density n(r). In practice, each ipi(r) is discretized by a basis 
set expansionpH' which makes \P a huge vector with J\f(= N B N G ) elements. 

In the conventional approach,^ the ground-state energy E G and wavefunctions \P G for the given 
R are obtained by minimization of E tota i[\P , R] with respect to the wavefunctions \P under the 
orthonormality constraints: 

j Vi(r)Vi(r)dr = ay. (2.4) 

\P G calculated in this way is then used to study various properties of the system. 

In our implementation, on the other hand, the above constraints are eliminated by modifying 
the total energy functional according to Refs. []l7,|33,|34, 35], in which orthonormality of the wave- 
functions is satisfied either implicitly0ilii* or automatically & Moreover, all the orbitals are 
updated simultaneously ,11' and self-consistency of En xc is taken into account in the evaluation 
of its gradient. Then, if the modified total energy functional for the given R is denoted by E [&], E G 
and \P G are obtained by minimization of E with respect to ^ without any constraints. Thanks 
to this reformulation, we can easily implement the quasi-Newton method which is one of the most 



efficient algorithms for the unconstrained optimization of nonlinear functions. 11 11 ® Furthermore, 
the use of nonorthogonal basis functions is much easier in this case.@il The above ground-state 
calculations are usually performed for a series of slowly varying R, each of which is called an ionic 
step. 

2.2 BFGS with full Hessian 

We illustrate the conventional quasi-Newton method using the BFGS formulaH© here, which 
will serve as a prototype for the implementation in reduced space. For simplicity, we assume the 



new total energy (eq. (2/7)) is always lower than the previous value. 



Choose TCo and Re- 
calculate E = E [& ] and g = VE \& Q ). 
Set k=0. 

Do while (\g k \ > e) 



Pk = ~n k X 9k (2-5) 

& k+1 = V k + Pk (2.6) 

E k+1 = E [& k+1 ] (2.7) 

g k+1 = VE[* k+1 ] (2.8) 

AW k = & k+1 - & k (2.9) 

Ag k = g k+1 -g k (2.10) 



rik+i = rik tTo , — r . rT (2-HJ 



k = k + l (2.12) 

End do 

While this algorithm is simple and efficient in terms of the convergence rate, its memory usage 
and computational effort scale as 0{J\f 2 ) and 0(A/" 3 ) respectively, which are prohibitive. Although 
the latter can be reduced to 0(J\f 2 ), if the updating formula for the inverse Hessian (H^ 1 ) is used,0 
this is still far from practical. The purpose of this article is to present the improved algorithm!!!! 
in which both scale as 0{M) with modest prefactors. 

2.3 QR- decomposition 

At this point, we give a brief introduction to the QR-decomposition,El which plays an important 
role in the algorithm presented in the next section. Let us assume B{M x r) is a set of linearly 
independent vectors: 

B = (pi p 2 ••• P r ), (2-13) 



where 1 < r <^ Af. Then the QR-decomposition of B is given by 

B = Z T, (2.14) 
where Z(M x r) is a set of orthonormal vectors spanning the same subspace as B, i.e. 

Z T Z = I, (2.15) 

and T(r x r) is an invertible upper-triangular matrix. In practice, this decomposition is obtained 
by applying the addition procedure given below repeatedly, which is (mathematically) equivalent 
to constructing an orthonormal basis from the left (p x ) to the right (p r ) by the Gram-Schmidt 
scheme. Note, however, that only B and T are considered explicitly in the following. 

Here we show how to update the above QR-decomposition when B is slightly modified. In the 
first case where a vector g is added to B, i.e. 

B+ = (Pi p 2 ••• P r 9) = {B g), (2.16) 
the new decomposition is given by 

B + = Z + T + , (2.17) 

where 

T + ((r + l)x(r + l))= I T U I , (2.18) 

V p J 

u = Z T g = (T T )-\B T g), (2.19) 

and 



p= ^\g\ 2 -\u\\ (2.20) 

If p ^ 0, T + is also an invertible upper-triangular matrix. 
Next, we consider the case of dropping the leftmost vector pi from B, i.e. 

B- = (p 2 p 3 ••• Pr ). (2.21) 
The corresponding decomposition is given by 

= Z_T_, (2.22) 

where T_ satisfies 

T T T- = B T _B_. (2.23) 

Obviously, the right-hand side of eq. ( 2,23j ) is included in B T B, which is easily calculated from 

B T B = T T T. (2.24) 

Therefore, T_ is obtained by the Cholesky decompositionB of a small matrix at negligible cost. A 
more refined approach is introduced in Ref. but the above procedure seems to be sufficient for 
our present purpose. 



2.4 BFGS with reduced Hessian and limited memory 

Here we present the state-of-the-art implementation of the quasi-Newton method,!^}© which is 
obtained by modifying the conventional algorithm (§ ^) under two assumptions: (i) T~Lq = al 
(a > 0), and (ii) At most m previous steps are stored. 

In order to fully exploit these conditions, it is more convenient to use a compact representation 
for the Hessian: 

H = Z T HZ, (2.25) 

where Z (J\f x r) is the current (orthonormal) basis, H (r x r) is the reduced Hessian, and 1 < 
r < m + 1 <C J\f. While Z and 7i also appear in the following algorithm, they are not explicitly 
calculated. The reduced vectors are defined in a similar way; the reduced gradient u, for instance, 
is given by it = Z T g, where g = \7E. The correspondence between the full/reduced vectors is 
shown in Table I. 

1. Initilization: 

Set k = and r = 1, where k and r denote the loop index and the rank of the reduced space, 
respectively. 

Choose the initial wavefunction (^o); the approximate curvature (a), the convergence criterion 
(e), and the maximum rank of the reduced space (m). 
Calculate the total energy 

E = E l* ] (2.26) 

and its gradient 

g = VE[¥ ]. (2.27) 

IF (\g \ < e) THEN quit. 

ELSE H = (a), B = (g ), T = (\g \), and v = (|g |). Moreover, Z = (g /\g \) and 
Tio = a I are implicitly assumed. 

If the Hessian of the previous ionic step is taken over, several modifications are required in this 
step, which are, however, straightforward. 

2. Calculate the new search direction in reduced space: 

q k = -H^v k . (2.28) 

3. Calculate the new search direction: 

Pk = Z k q k = BkiT^q,). (2.29) 

4. Update the subspace: (B k = Z k T k — > B' k = Z k T k ), where 

B k (M x r) = (p k _ r+1 ■ ■ ■ p k ^ g k ) (2.30) 



and 

B' k {M x r) = ( Pk _ r+1 ■ ■ ■ Pk ^ Pk ). (2.31) 

Ti is obtained from T k and q k , whereas Z k remains unchanged© 
5. Set a = 1 and calculate the gradient of the total energy along p k as 



E , = d_E [W k + a Pk ] 



= 9kPk = v kQk- ( 2 -32) 

Q=0 



da 

6. Calculate the new wavefunction: 

^fc+i = *k + a Pk . (2.33) 

7. Calculate the new total energy: 

E k+1 = E[¥ k+1 ]. (2.34) 

IF (E k +i > Ek) THEN estimate the optimal a by a parabolic fit with E k , E' , and E k +i, and 
go to |. 

8. Calculate the new gradient: 

g k+1 = VE[* k+1 ]. (2.35) 

IF (|flf fe+1 | <e) THEN quit. 

9. Extend the subspace: (B' k = Z k T k — > B k = Z' k T k ), where 

B' k (M x r) = (p k _ r+1 ••• Pk ) (2.36) 

and 

B'l{M x (r + 1)) = ( Pk _ r+1 ... p k g k+1 ). (2.37) 



As explained in § 2.3, T' k is obtained from T k , u k , and p k +i, where 

u k = Zlg k+l = {T' k T )- l {B' k T g k+l ) (2.38) 

and 



Pk+i = \ \9 k +i\ 2 ~ \ u k\ 2 - (2.39) 



We assume pk+i ^ in the following. Then, the new basis Z' k {M x (r + 1)) is given b> 

Z' k = (Z k z k+1 ), (2.40) 

where z^+i = (g k+ i — Z k u k ) / p k+ \. However, z k+ i is not explicitly calculated. 

10. r = r + 1 

11. Calculate the reduced gradients as 

v'k = Z' k T 9k = ( V * \ (2-41) 



and 



u 



'k = Z k T g k+1 =[ Uk ). (2.42) 

Pk+1 



There is no loss of information here, since g k ,g k+1 G Z' k . 
12. Update the reduced Hessian using the BFGS formulaJH!© 



where 



H'l{r x r) = Z ' k T HtZ' k = H' k - H '^ H * + (2.43) 



s k = Z k T AV k ,«l (2.44) 
y fe = zjAg* = - v' k , (2.45) 



and 



^(r x r) = = ° j . (2.46) 

7^ is defined as the right-hand side of eq. ( 2.1l| ), and eq. ( |2.43| ) is derived from this definition. 



Note that sTy k > is assumed here; otherwise, the Hessian is not updated. 
13. IF (r = m + 1) THEN reduce the subspace: (B' k ' = Z' k T' k ' — ► Sfc+i = Z/ C+1 T/ C+1 ), where 

x (m + 1)) = (p fc _ m+1 P fc _ m+ 2 • • • Pfc fffc+i) (2-47) 

and 

B k+1 (Afxm) = (p fc _ m+2 ••• P fc flf fe+ i). (2.48) 

T/j+i is easily obtained from T^' according to § |2.3| . Then, H k+ i(m x m) = Z^ +1 T-C^ Z k+ i is 
calculated from T k ,T k+ i, and by way of B k+l H k B k+ i. At this point, the new Hessian 
(Ti. k +l) in the new basis {Z k+ \ and its orthogonal complement) is defined as a block-diagonal 
matrix consisting of H k+ \ and al{{j\f — m) x (A/" — m)), which was implicitly used in eq. 
( 2. 46] ). Therefore, part of the information contained in TL k has been discarded here. Similarly, 



v k+ \ = Z k+l g k+1 is calculated from T k , T k+ i, and u' k by way of B k+1 g k+1 , but there is no loss 
of information here. Finally, we set r = m. 

ELSE H k+1 = H k ,B k+1 = B'l,T k+1 = T k , and v k+1 = u' k . Moreover, Z k+1 = Z' k and 

14. k = k + 1 

15. Go to a 



T~i k+ \ = TC k are implicitly assumed. 



• While < k < m — 1, this algorithm is identical to the conventional one (§ ^L^) with TLo = 
al within round-off errors. The two algorithms begin to differ once k reaches m, but the 
deterioration of the convergence rate is minimized by constructing the subspace with the 
previous search directions rather than the gradients .0@ 



For simplicity, the above algorithm includes minimal exception handling. Therefore, the orig- 
inal paper® should be consulted for a more complete one. However, such exceptions are 
observed only in the very early stages of the first ionic step, where the quadratic model is not 
valid. 



One cycle requires approximately 2rJ\f multiply-and-add operations, arising from eqs. ( 2.29 ) 



and ( |2.38 ). For practical values of m (< 10), these costs will be much lower than those of 



evaluating the total energy in step [_ 
The basis functions should be appropriately scaled0'i3) in advance, so that their contribution 
to the total energy is similar. 

The reduced Hessian is diagonalized in each cycle to guarantee its positive definiteness; 



nonpositive eigenvalues, if any, are modified appropriately. Then, it follows from eqs. ( 2.28 ) 
and ( |2.32|) that E' = —v^H^Vk < 0, because H^ 1 is also positive definite and \vj,\ = \g k \> e. 
Furthermore, the average eigenvalue of the reduced Hessian, denoted by is also calculated 
and stored for later use. 

• We explain the choice of a used in step |l] and 12 here. Since a is the approximate curva- 
ture along the new direction,© a reasonable estimate is needed to achieve high performance. 
Therefore, a number of strategies have been proposed to choose optimal crJUEiJi!' most of 
which provide dynamical estimates. Nevertheless, we use a constant a during each ionic step 
unless otherwise noted, which is determined as follows: In the first ionic step, a is estimated 
from the coarse grid iterations© At the end of each ionic step, the sequence is further 
averaged to give the new a for the next ionic step, a obtained in this way varies only slowly 
with ionic steps, while providing stable and high performance in the systems we have studied 
so far. Comparison is also made with the dynamical estimates in § |||. 

2.5 Data compression 

The memory usage of the algorithm illustrated in the previous section is dominated by the m 
previous search directions, which amount to mM elements. While this is much smaller than the 
storage of the full Hessian (= TV 2 ), it is still a serious obstacle in large-scale simulations. In what 
follows, we present a simple algorithm to compress the previous search directions without sacrificing 
the efficiency of the original method. In this algorithm, one search direction is compressed in each 
cycle, by taking advantage of its structure. If p(J\f), which is being compressed, is viewed as a 
two-dimensional array p(N B ,N G ), the size of p(i,j) for a given basis function (j) is expected to 
be similar for all orbitals (i). Based on this idea, the largest element of \p(i, j)\ with respect to i is 
chosen as the scale factor. Moreover, N bit is defined as the number of bits assigned to each element 
of p after compression. 



Then, the scale factor uj (N G ) and the compressed array p 1 (N B , N G ) are given by 

real*8 w (j) = ^max \p /I m „ (2.49) 

and 

integer p. (i,j) = round ( ^^ ) + I max (2.50) 

respectively, where / max = 2 Nbit ~ 1 — 1 and < p Y < 2/ max = 2 Arbit — 2. Therefore, each element 
of p l is representable by N bit bits. The original values of p are recovered approximately by 

p(i,i)«w(i)(Pi(«'»i)-0- (2-51) 

In this method, the quality of the compression can be controlled by a single parameter, N bit . 
Furthermore, the largest element for each j, which is the most important one, remains exact. 

The total storage for the m search directions after compression is mN bit M/8 bytes, if appropriately 
packed with bit operations. If m = N hit = 8, for instance, this amounts to only one double-precision 
array of size N . Note also that the storage for the scale factors is minor. 

In the current implementation, p k is compressed in step [|, when added to B' k . At the same 
time, the last column of T k is calculated directly from the compressed p k (rather than using q k ) to 
maintain the consistency of the QR-decomposition. However, the uncompressed p k is also retained 
and used in step ^. 

Unfortunately, some inconsistency seems inevitable in the update of the reduced Hessian, since 
s k and y k no longer belong to Zi. Nevertheless, E' < remains valid as long as the reduced 
Hessian is positive definite and the latest g and p are uncompressed. Therefore, the stability of 
the minimization is guaranteed even if the previous search directions are highly compressed. 

§3. Results 

As a test of our implementation under realistic conditions, we performed a series of Born- 
Oppenheimer dynamics!!' for bulk diamond at 220 K in a periodic cubic supercell of 64 atoms 
within the local density approximation.BIl' The wavefunctions were expanded by the adaptive 
finite-element method0lll' with an average cutoff energy of 43 Ry, which corresponds to N G = 
8 x 14 3 = 21,952. Since N B is equal to 128, Af amounts to approximately 2,800,000 in this sys- 
tem. The Brillouin zone was sampled only at the T-point, and the separable pseudopotentials were 
used.!!!! The convergence criterion (e) was chosen so that \E k+ i — E k \ ~ 2 x 10 8 Ry/atom 
when |<7fc + i| < e was satisfied. Convergence to the ground state was accelerated by the enhanced 
extrapolation scheme,© which provides accurate initial wavefunctions with the help of population 
analysis. 

The equations of motion for the ions were integrated using the velocity- Verlet methodS 1 with 
a timestep of 80 a.u. (~ 2 fs). Starting from the same ionic configuration, each run lasted for 57 



ionic steps, the last 50 steps of which were used to collect the statistics. Moreover, B,T, and H 
were taken over from previous ionic steps unless otherwise noted. Therefore, these matrices were 
saturated during this period in all runs. 

We first show the average number of iterations (iV itor ) and total energy evaluations (N E ) needed 
to optimize the electronic-structures for the conjugate gradient method using the Polak-Ribiere 
formulaEl' and the quasi-Newton method using the BFGS formula in Table II. The convergence 
rate of the quasi-Newton method as measured by N itor is already comparable to that of the conjugate 
gradient method for m = 2, and becomes better as m is increased. However, there is no point in 
using m much larger than N itor (say, 20), because the Hessian is dominated by the contribution from 
previous ionic steps. In practice, any reasonable choice of m, e.g. 5-8, will provide near-optimal 
performance, since N itoI depends only weakly on m in this range. Note also that the CPU-time is 
more closely related to iV E than N itm . Therefore, the quasi-Newton method was much faster than 
the conjugate gradient method for all m we tried. Specifically, N E = 2iV iter + 1 in the conjugate 
gradient method, because at least one line search was forced to maintain the conjugacy of the 
search directions. In contrast, N E = N itm + 1 in the quasi-Newton method, which means that no 
line search was required in step [|. 

The algorithm presented in § |2.4| has a number of options which are not uniquely determined. 
Therefore, we examine some of them here, as shown in Table III. (a) is the reference run performed 
with m = 7, taken from Table II. (b)-(d) were performed under the same conditions as (a) except 
for the following points: (b) A line search with a parabolic fit was forced in each cycle to see 
if the convergence rate is improved. However, N E was almost doubled without any reduction of 
iV itcr . Therefore, it is not justified to perform a line search in the quasi-Newton method, which is 
consistent with previous findings© (c) The Hessian was discarded at the end of each ionic step. 
Since the convergence rate deteriorates significantly, the inheritance of the Hessian seems to be 
profitable, (d) We tried = \y k \ 2 /y^s^, which gave good results in Refs. [|^,^2|. However, this 
choice requires more iterations on average, presumably because a k varies too rapidly with k. The 
norm of the gradient also decays less smoothly in this case. 

So far the previous search directions have been uncompressed, i.e. stored as 64-bit double- 
precision arrays. The effect of compression is examined here in a series of runs for m = 3 and 7, 
with different JV bit . As shown in Table IV, the performance is maintained after compression by a 
factor of 8-16, especially for m = 7. Moreover, no instability occurred up to N bit = 3. We also show 
the distribution of the search direction after compression with N bit = 8 in Fig. 1. The distribution 
function d(x) is defined as the number of elements of p : such that = x, where 1 <i < M and 
< x < 2 8 — 2 = 254. Therefore, J2 X d(x) = Af, and there are two singularities at x = and 254. 
The width of the distribution is approximately equal to I max , which indicates that our choice of the 
scale factor (eq.( [2.49|) ) is appropriate. 



Finally, in order to examine the generality of our implementation, part of the runs were repeated 
for an isolated cytosine molecule (C4H5N3O) in a cubic supercell of (16 a.u.) 3 , with a timestep 
of 40 a.u. (~ 1 fs). The average cutoff energy was 39 Ry, which corresponds to _/V B = 21, N G = 
8 x 16 3 = 32, 768, and M ~ 700, 000. The results shown in Table V suggest that the performance 
of the quasi-Newton method in this system is somewhat more robust against compression, but is 
qualitatively similar to the previous results in other respects. 

§4. Summary 

We have shown in this article that the quasi-Newton method using the BFGS formula is the 
method of choice for large-scale electronic-structure calculations, if combined with efficient memory 
management. The advantages of the quasi-Newton method over the conjugate gradient method are 
summarized as follows: (i) The Hessian of the previous ionic step can be taken over to accelerate 
the convergence, (ii) Practically no line search is required, which reduces the cost of each step 
significantly. 

Although there is room for fine-tuning the algorithm and more extensive tests are necessary, 
the quasi-Newton method will provide significant speedups of the first-principles codes, together 
with other techniques like the enhanced extrapolation scheme© and the constrained molecular 
dynamics. 
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Table I. Notation for the full/reduced vectors. Note that v and u denote the previous and the current gradients, 
respectively. 





full 


reduced 


gradient 


9 


v, u 


search direction 


P 


Q 


update vectors 


A<i> 


s 




Ag 


V 


wavefunction 







Table II. The performance of the conjugate gradient method and the quasi-Newton method is compared in the 
molecular-dynamics simulations of bulk diamond. iVi tor and N E denote the number of iterations and total energy 
evaluations averaged over 50 ionic steps, respectively. 





method 




N E 


Conjugate 


gradient 


14.3 


29.6 


BFGS, 


m = 2 


14.9 


15.9 




3 


13.8 


14.8 




4 


13.8 


14.8 




5 


12.9 


13.9 




6 


12.4 


13.4 




7 


12.1 


13.1 




8 


11.9 


12.9 




9 


11.7 


12.7 




10 


11.6 


12.6 




20 


12.4 


13.4 



Table III. A number of variants are compared for the BFGS with m = 7. (a) Reference run from Table II. (b) A 
line search with a parabolic fit was forced in each cycle, (c) The Hessian was discarded at the end of each ionic 
step, (d) <jfc = \y k \ 2 /y^Sk was used as the curvature for the new direction. 



method 


AT lter 


iV E 


(a) 


12.1 


13.1 


(b) 


12.2 


25.4 


(c) 


15.1 


16.1 


(d) 


14.3 


15.4 



Table IV. The effect of compression is compared for the BFGS with m = 3 and 7. 



m N hit N it(SI N E 



7 



64 


13.8 


14.8 


8 


14.1 


15.1 


4 


14.3 


15.3 


3 


15.4 


16.4 


64 


12.1 


13.1 


8 


12.2 


13.2 


4 


12.2 


13.2 


3 


13.3 


14.3 



Table V. The results of selected runs from Table II-IV, repeated for an isolated cytosine molecule (C4H5N3O). 



method iVbit N itBI Ni 



Conjugate gradient 




13.1 


27.1 


BFGS, m = 3 


64 


13.0 


14.0 




8 


13.0 


14.0 




4 


13.0 


14.0 




3 


13.3 


14.3 


BFGS, m = 7 


64 


10.7 


11.7 



Fig. 1. The distribution function d(x) of the compressed search direction p : for iVbit = 8. 
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